Grounded Reinforcement Learning: Learning to Win the Game under Human Commands Supplementary Materials

Neural Information Processing Systems

In this section, we describe the details of the MiniRTS environment and the human dataset. The data do not contain any personally identifiable information or offensive content. Figure 1: MiniRTS [2] implements a rock-paper-scissors attack graph; each army type has some units it is effective against and some it is vulnerable to. "swordman", "spearman" and "cavalry" are all effective against "archer". Figure 2: Building units can produce different army units using resources. Resource Units: Resource units are stationary and neutral.
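The counter structure quoted above lends itself to a simple lookup table. Below is a minimal, hypothetical Python sketch encoding only the relations stated in this snippet; the full attack graph is given in Figure 1 of the paper, and the dictionary and helper names here are illustrative, not from the source.

```python
# Hypothetical partial encoding of the MiniRTS counter relations quoted above.
# Only the edges stated in the snippet are included; Figure 1 of the paper
# defines the complete rock-paper-scissors attack graph.
EFFECTIVE_AGAINST = {
    "swordman": {"archer"},
    "spearman": {"archer"},
    "cavalry": {"archer"},
}

def is_effective(attacker: str, defender: str) -> bool:
    """Return True if `attacker` counters `defender` under the partial graph above."""
    return defender in EFFECTIVE_AGAINST.get(attacker, set())

assert is_effective("cavalry", "archer")
assert not is_effective("archer", "cavalry")
```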


See, Think, Act: Online Shopper Behavior Simulation with VLM Agents

Zhang, Yimeng, Gesi, Jiri, Xue, Ran, Wang, Tian, Wang, Ziyi, Lu, Yuxuan, Zhan, Sinong, Zeng, Huimin, Cui, Qingjun, Guo, Yufan, Huang, Jing, Shah, Mubarak, Wang, Dakuo

arXiv.org Artificial Intelligence

LLMs have recently demonstrated strong potential in simulating online shopper behavior. Prior work has improved action prediction by applying SFT on action traces with LLM-generated rationales, and by leveraging RL to further enhance reasoning capabilities. Despite these advances, current approaches rely on text-based inputs and overlook the essential role of visual perception in shaping human decision-making during web GUI interactions. In this paper, we investigate the integration of visual information, specifically webpage screenshots, into behavior simulation via VLMs, leveraging the OPeRA dataset. By grounding agent decision-making in both textual and visual modalities, we aim to narrow the gap between synthetic agents and real-world users, thereby enabling more cognitively aligned simulations of online shopping behavior. Specifically, we employ SFT for joint action prediction and rationale generation, conditioning on the full interaction context, which comprises action history, past HTML observations, and the current webpage screenshot. To further enhance reasoning capabilities, we integrate RL with a hierarchical reward structure, scaled by a difficulty-aware factor that prioritizes challenging decision points. Empirically, our studies show that incorporating visual grounding yields substantial gains: the combination of text and image inputs improves exact match accuracy by more than 6% over text-only inputs. These results indicate that multi-modal grounding not only boosts predictive accuracy but also enhances simulation fidelity in visually complex environments, capturing nuances of human attention and decision-making that text-only agents often miss. Finally, we revisit the design space of behavior simulation frameworks, identify key methodological limitations, and propose future research directions toward building efficient and effective human behavior simulators.
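As a concrete illustration of the input construction described above, here is a minimal, hypothetical sketch of how the full interaction context (action history, past HTML observations, current screenshot) might be packed into a single VLM training example. All field names and the message format are assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass

@dataclass
class Step:
    action: str   # e.g. 'click(add_to_cart)' -- illustrative action string
    html: str     # HTML observation before the action

def build_example(history: list[Step], screenshot_path: str,
                  rationale: str, next_action: str) -> dict:
    """Pack the interaction context into one SFT example (hypothetical format)."""
    context = "\n".join(
        f"[t-{len(history) - i}] action={s.action}\nhtml={s.html[:2000]}"
        for i, s in enumerate(history)
    )
    return {
        "image": screenshot_path,  # current webpage screenshot
        "prompt": f"Interaction history:\n{context}\nPredict the next action.",
        "target": f"Rationale: {rationale}\nAction: {next_action}",  # joint targets
    }
```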


WebGraphEval: Multi-Turn Trajectory Evaluation for Web Agents using Graph Representation

Qian, Yaoyao, Wang, Yuanli, Zhang, Jinda, Zong, Yun, Chen, Meixu, Zhou, Hanhan, Huang, Jindan, Zeng, Yifan, Hu, Xinyu, Song, Chan Hee, Zhang, Danqing

arXiv.org Artificial Intelligence

Current evaluation of web agents largely reduces to binary success metrics or conformity to a single reference trajectory, ignoring the structural diversity present in benchmark datasets. We present WebGraphEval, a framework that abstracts trajectories from multiple agents into a unified, weighted action graph. This representation is directly compatible with benchmarks such as WebArena, leveraging leaderboard runs and newly collected trajectories without modifying environments. The framework canonically encodes actions, merges recurring behaviors, and applies structural analyses including reward propagation and success-weighted edge statistics. Evaluations across thousands of trajectories from six web agents show that the graph abstraction captures cross-model regularities, highlights redundancy and inefficiency, and identifies critical decision points overlooked by outcome-based metrics. By framing web interaction as graph-structured data, WebGraphEval establishes a general methodology for multi-path, cross-agent, and efficiency-aware evaluation of web agents.
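The framework is described in prose only; the following Python sketch shows one plausible way to merge canonically encoded trajectories into a unified weighted action graph with success-weighted edge statistics. The canonicalization function and edge attributes are assumptions, not WebGraphEval's actual implementation.

```python
from collections import defaultdict

def canonical(action: str) -> str:
    # Placeholder canonical encoding; the real framework would normalize
    # action arguments (URLs, element ids, text) before merging.
    return action.strip().lower()

def build_action_graph(trajectories: list[tuple[list[str], bool]]) -> dict:
    """Merge trajectories (action list, success flag) into weighted edges."""
    edges = defaultdict(lambda: {"count": 0, "success": 0})
    for actions, success in trajectories:
        nodes = ["START"] + [canonical(a) for a in actions] + ["END"]
        for u, v in zip(nodes, nodes[1:]):
            edges[(u, v)]["count"] += 1      # recurring behaviors merge here
            edges[(u, v)]["success"] += int(success)
    # Success-weighted edge statistic: fraction of traversals ending in success.
    return {e: s["success"] / s["count"] for e, s in edges.items()}
```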


VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos

Lu, Dunjie, Xu, Yiheng, Wang, Junli, Wu, Haoyuan, Wang, Xinyuan, Wang, Zekun, Yang, Junlin, Su, Hongjin, Chen, Jixuan, Chen, Junda, Mao, Yuchen, Zhou, Jingren, Lin, Junyang, Hui, Binyuan, Yu, Tao

arXiv.org Artificial Intelligence

Training computer-use agents requires massive amounts of GUI interaction data, but manually annotating action trajectories at scale is prohibitively expensive. We present VideoAgentTrek, a scalable pipeline that automatically mines training data from publicly available screen-recorded videos at web scale, eliminating the need for manual annotation. Our approach addresses a key challenge: raw videos contain implicit demonstrations but lack explicit action labels. To solve this, we develop Video2Action, an inverse dynamics module (IDM) with two components: (1) a video grounding model that detects and localizes GUI actions with precise temporal boundaries and context, and (2) an action-content recognizer that extracts structured parameters like click coordinates and typed text with high fidelity. Applied to 39,000 YouTube tutorial videos, our pipeline generates 1.52 million interaction steps automatically. We leverage this data through continued pretraining followed by supervised fine-tuning. On OSWorld-Verified, our approach improves task success rates from 9.3% (SFT-only baseline) to 15.8%, a 70% relative improvement. On AgentNetBench, step accuracy increases from 64.1% to 69.3%. Our results demonstrate that passive internet videos can be transformed into high-quality supervision for computer-use agents, providing a scalable alternative to expensive manual annotation.
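To make the two-stage Video2Action design concrete, here is a hypothetical pipeline skeleton. The class names, method signatures, and the ActionSpan/ActionRecord types are illustrative stand-ins for the paper's video grounding model and action-content recognizer.

```python
from dataclasses import dataclass

@dataclass
class ActionSpan:
    start_s: float   # temporal boundary (seconds)
    end_s: float

@dataclass
class ActionRecord:
    kind: str        # e.g. 'click', 'type'
    params: dict     # e.g. {'x': 412, 'y': 96} or {'text': 'hello'}
    span: ActionSpan

class VideoGrounder:
    def detect(self, video_path: str) -> list[ActionSpan]:
        """Stage 1: localize GUI actions with temporal boundaries (stub)."""
        raise NotImplementedError

class ActionRecognizer:
    def parse(self, video_path: str, span: ActionSpan) -> ActionRecord:
        """Stage 2: extract structured parameters for one detected action (stub)."""
        raise NotImplementedError

def video2action(video_path: str, grounder: VideoGrounder,
                 recognizer: ActionRecognizer) -> list[ActionRecord]:
    """Inverse dynamics: unlabeled screen recording -> labeled interaction steps."""
    return [recognizer.parse(video_path, s) for s in grounder.detect(video_path)]
```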


Large-scale User Game Lifecycle Representation Learning

Gou, Yanjie, Liu, Jiangming, Xue, Kouying, Hu, Yi

arXiv.org Artificial Intelligence

Existing representation learning methods crafted for handling billions of items in recommendation systems are, however, unsuitable for game advertising and recommendation. This is primarily due to game sparsity, where the mere hundreds of games fall short for large-scale user representation learning, and game imbalance, where user behaviors are overwhelmingly dominated by a handful of popular games. To address the sparsity issue, we introduce the User Game Lifecycle (UGL), designed to enrich user behaviors in games. Additionally, we propose two innovative strategies aimed at manipulating user behaviors to more effectively extract both short- and long-term interests. To tackle the game imbalance challenge, we present an Inverse Probability Masking strategy for UGL representation learning. The offline and online experimental results demonstrate that the UGL representations significantly enhance the models, achieving an average offline AUC increase of 1.83% and an average online CVR increase of 21.67% for game advertising, and an offline AUC increase of 0.5% and an online ARPU increase of 0.82% for in-game item recommendation.
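The abstract does not spell out Inverse Probability Masking. One plausible reading, sketched below in Python, is to mask each game in a user's behavior sequence with probability proportional to the inverse of that game's empirical frequency, so rare games receive more prediction signal and the handful of dominant games do not swamp training. Every detail here is an assumption; the paper's actual scheme may differ.

```python
import random
from collections import Counter

def inverse_probability_mask(seq: list[str], counts: Counter,
                             base_rate: float = 0.15,
                             mask_token: str = "[MASK]") -> list[str]:
    """Mask games with probability ~ 1/frequency (hypothetical scheme).

    Assumes every game in `seq` appears in `counts`. The rarest game is
    masked at `base_rate`; popular games are masked proportionally less.
    """
    inv = {g: 1.0 / counts[g] for g in counts}   # rare games -> larger inverse
    max_inv = max(inv.values())
    return [mask_token if random.random() < base_rate * inv[g] / max_inv else g
            for g in seq]
```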


Customer-R1: Personalized Simulation of Human Behaviors via RL-based LLM Agent in Online Shopping

Wang, Ziyi, Lu, Yuxuan, Zhang, Yimeng, Huang, Jing, Wang, Dakuo

arXiv.org Artificial Intelligence

Simulating step-wise human behavior with Large Language Models (LLMs) has become an emerging research direction, enabling applications in various practical domains. While prior methods, including prompting, supervised fine-tuning (SFT), and reinforcement learning (RL), have shown promise in modeling step-wise behavior, they primarily learn a population-level policy without conditioning on a user's persona, yielding generic rather than personalized simulations. In this work, we pose a critical question: how can LLM agents better simulate personalized user behavior? We introduce Customer-R1, an RL-based method for personalized, step-wise user behavior simulation in online shopping environments. Our policy is conditioned on an explicit persona, and we optimize next-step rationale and action generation via action correctness reward signals. Experiments on the OPeRA dataset demonstrate that Customer-R1 not only significantly outperforms prompting and SFT-based baselines in next-action prediction tasks, but also better matches users' action distribution, indicating higher fidelity in personalized behavior simulation.
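A minimal sketch of the training signal described above, assuming a simple exact-match notion of action correctness; the prompt template and reward values are illustrative, not Customer-R1's actual design.

```python
def build_prompt(persona: str, context: str) -> str:
    """Condition the policy explicitly on the user's persona (hypothetical template)."""
    return (f"You are simulating this shopper: {persona}\n"
            f"Session so far:\n{context}\n"
            "Give your rationale, then the next action.")

def action_correctness_reward(predicted_action: str, gold_action: str) -> float:
    """Binary correctness signal on the generated next action."""
    return 1.0 if predicted_action.strip() == gold_action.strip() else 0.0
```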



Grounded Reinforcement Learning: Learning to Win the Game under Human Commands Supplementary Materials

Neural Information Processing Systems

In this section, we describe the details of the MiniRTS environment and the human dataset. … "spearman" but is restrained by "cavalry". "swordman", "spearman" and "cavalry" all are effective against "archer". Figure 2: Building units can produce different army units using resources. "workshop" can produce "archer", "dragon" and "catapult" while other … Resource Units: Resource units are stationary and neutral. Resource units cannot be constructed by anyone and are created at the beginning of a game. Building Units: MiniRTS supports 6 different building unit types.
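Using only the facts stated in this snippet, a minimal hypothetical encoding of the unit taxonomy might look like the following; the remaining five building types and their production lists are deliberately omitted because the snippet is truncated.

```python
# Partial MiniRTS unit taxonomy, encoding only what the snippet states.
PRODUCES = {
    "workshop": ["archer", "dragon", "catapult"],
    # ... snippet truncated; the other 5 building types are not listed here.
}

RESOURCE_UNIT_PROPERTIES = {
    "stationary": True,      # resource units do not move
    "neutral": True,         # owned by neither player
    "constructible": False,  # created only at game start, never built
}
```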


Shop-R1: Rewarding LLMs to Simulate Human Behavior in Online Shopping via Reinforcement Learning

Zhang, Yimeng, Wang, Tian, Gesi, Jiri, Wang, Ziyi, Lu, Yuxuan, Lin, Jiacheng, Zhan, Sinong, Gao, Vianne, Jiao, Ruochen, Liu, Junze, Qian, Kun, Tang, Yuxin, Xue, Ran, Zhang, Houyu, Cui, Qingjun, Guo, Yufan, Wang, Dakuo

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have recently demonstrated strong potential in generating 'believable human-like' behavior in web environments. Prior work has explored augmenting training data with LLM-synthesized rationales and applying supervised fine-tuning (SFT) to enhance reasoning ability, which in turn can improve downstream action prediction. However, the performance of such approaches remains inherently bounded by the reasoning capabilities of the model used to generate the rationales. In this paper, we introduce Shop-R1, a novel reinforcement learning (RL) framework aimed at enhancing the reasoning ability of LLMs for simulation of real human behavior in online shopping environments. Specifically, Shop-R1 decomposes the human behavior simulation task into two stages: rationale generation and action prediction, each guided by distinct reward signals. For rationale generation, we leverage internal model signals (e.g., logit distributions) to guide the reasoning process in a self-supervised manner. For action prediction, we propose a hierarchical reward structure with difficulty-aware scaling to prevent reward hacking and enable fine-grained reward assignment. This design evaluates both high-level action types and the correctness of fine-grained sub-action details (attributes and values), rewarding outputs proportionally to their difficulty. Experimental results show that our method achieves a relative improvement of over 65% compared to the baseline.
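To illustrate the hierarchical reward described above, here is a hypothetical sketch: coarse credit for the high-level action type plus finer-grained credit for sub-action attributes and values, scaled by a difficulty factor. The weights and the difficulty input are assumptions, not Shop-R1's actual constants.

```python
def hierarchical_reward(pred: dict, gold: dict, difficulty: float) -> float:
    """Coarse action-type credit + fine-grained sub-action credit, scaled so
    harder decision points earn proportionally more (hypothetical weights)."""
    r = 0.0
    if pred.get("type") == gold.get("type"):
        r += 0.5                                   # high-level action type
        attrs = gold.get("attributes", {})
        if attrs:
            matched = sum(pred.get("attributes", {}).get(k) == v
                          for k, v in attrs.items())
            r += 0.5 * matched / len(attrs)        # attributes and values
    return r * difficulty                          # difficulty-aware scaling

# Example: correct action type but wrong target element.
pred = {"type": "click", "attributes": {"element_id": "btn-buy"}}
gold = {"type": "click", "attributes": {"element_id": "btn-cart"}}
print(hierarchical_reward(pred, gold, difficulty=1.3))  # 0.5 * 1.3 = 0.65
```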